The family Brassicaceae is one of the major groups of the plant kingdom and comprises diverse species of great economic, agronomic and scientific importance, including the model plant Arabidopsis. The sequencing of the Arabidopsis genome has revolutionized our knowledge in the field of plant biology and provides a foundation for genomics and comparative biology. Genomic resources have been utilized in Brassica for diversity analyses, construction of genetic maps and identification of agronomic traits. In Brassicaceae, comparative sequence analysis across species has been used to understand genome structure and evolution and to detect conserved genomic segments. In this review, we focus on the progress made in genetic resource development, genome sequencing and comparative mapping in Brassica and related species. The utilization of genomic resources and next-generation sequencing approaches in the improvement of Brassica crops is also discussed.
Introduction

Comparative genomics is the study of similarities and differences at the genomic level to make inferences about the functions and evolution of various biological processes. It is an important field for studying genome evolution and sequence collinearity and for transferring information from extensively studied model organisms to species of commercial interest. Genome sequencing of Arabidopsis, a member of the Brassicaceae family, has revolutionized our knowledge in every field of plant biology and laid a foundation for genomics and comparative biology.

The family Brassicaceae is one of the major groups of the plant kingdom, comprising 340-360 genera and over 3,700 species distributed worldwide (Warwick et al. 2006). Many species within the family are of great economic, agronomic and scientific importance. Examples include Brassica napus and B. juncea (oilseed crops); B. rapa (turnip, leaf vegetables); B. oleracea (cabbage, cauliflower, kale, broccoli); Raphanus sativus (radish, a vegetable); and Arabidopsis thaliana (model plant). The six most widely cultivated species of the genus Brassica comprise the three diploid genomes of B. rapa (AA, 2n = 20), B. nigra (BB, 2n = 16) and B. oleracea (CC, 2n = 18), together with three amphidiploid species, B. juncea (AABB, 2n = 36), B. napus (AACC, 2n = 38) and B. carinata (BBCC, 2n = 34). Cytogenetic and hybridization studies have demonstrated that the amphidiploid species are natural hybrids of the diploids and that the six Brassica species are interlinked (Fig. 1, U 1935). Studies of genome evolution and comparative sequence analysis in Brassicaceae have also confirmed the interrelationship of the six Brassica species at the molecular level (Schmidt and Bancroft 2011). In Brassicaceae, genomic studies have mainly focused on the cultivated Brassica species and their diploid progenitors, which are compared with the Arabidopsis genome. With the inception of the Multinational Brassica Genome Project (MBGP) in 2002, the international Brassica community agreed to develop more genomic resources for Brassica crops and to sequence their genomes. In the last decade, significant advances have been made in the generation of genomic resources and in translational research, which will aid Brassica crop improvement (Augustine et al. 2013, Schmidt and Bancroft 2011). Recently, sequencing of B. rapa and other ancestral diploids of Brassica has expanded comparative genomic studies, providing resources for the identification of candidate genes for agronomic traits. Comparative mapping of Brassica species against the Arabidopsis genome helps in understanding conserved genetic architecture and genome evolution and in identifying and functionally analyzing genes for important agronomic traits. Genome-wide synteny analyses between the Arabidopsis and Brassica A, B and C genomes have identified conserved chromosomal blocks and elucidated genome rearrangements and karyotype diversification.

Fig. 1. Genomic relationships among the six cultivated Brassica species, represented by the 'Triangle of U'. Adapted from U (1935).
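The Triangle of U implies a simple arithmetic check: the somatic chromosome number of each amphidiploid should equal the sum of the 2n values of its two diploid progenitors. The following minimal Python sketch performs that check using only the chromosome numbers given above; the dictionary and function names are illustrative.

```python
# Diploid genomes of the Triangle of U and their somatic chromosome numbers (2n),
# as given in the text: B. rapa (AA), B. nigra (BB), B. oleracea (CC).
DIPLOID_2N = {"A": 20, "B": 16, "C": 18}

# Amphidiploids: (progenitor genomes, reported 2n).
AMPHIDIPLOIDS = {
    "B. juncea": ("AB", 36),
    "B. napus": ("AC", 38),
    "B. carinata": ("BC", 34),
}


def amphidiploid_2n(genomes: str) -> int:
    """Expected 2n of an amphidiploid carrying both diploid genomes in full
    (e.g. 'AB' -> AABB), i.e. the sum of the progenitors' 2n values."""
    return sum(DIPLOID_2N[g] for g in genomes)


for species, (genomes, reported) in AMPHIDIPLOIDS.items():
    label = "".join(g * 2 for g in genomes)  # e.g. 'AB' -> 'AABB'
    expected = amphidiploid_2n(genomes)
    print(f"{species} ({label}): expected 2n = {expected}, reported 2n = {reported}")
```

Each line of output shows the expected and reported values agreeing, which is the chromosome-level expression of the amphidiploids being natural hybrids of the diploids.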

Next-generation sequencing (NGS) techniques have been used to develop cost-effective and efficient methods for single nucleotide polymorphism (SNP) discovery, genotyping and gene expression studies. In some Brassica species, these techniques have been used to identify SNP markers and construct linkage maps. Transcriptome analysis has also been used to characterize gene expression profiles in response to abiotic and biotic stress and to understand gene regulatory mechanisms. In this review, we emphasize the advances in resource development in Brassicaceae, comparative mapping and the recent progress made in sequencing Brassica and related species. We focus on the genomic and genetic improvements made in the six cultivated Brassica crops and in Raphanus.

Linkage maps are a prerequisite for comparative mapping

Linkage maps are valuable resources for the identification and map-based cloning of important genes, the analysis of QTLs for agronomic traits, and comparative mapping. With the advancement of DNA markers such as RFLP (Restriction Fragment Length Polymorphism), AFLP (Amplified Fragment Length Polymorphism) and sequence-based markers like SSRs (Simple Sequence Repeats) and SNPs, linkage maps have been developed in almost all major crops. Comparative studies of linkage maps between species are useful in predicting diversity, genome evolution and organization. A number of genetic linkage maps have been generated in Brassica using different sets of markers and mapping populations. Linkage maps provide a basis for analyzing the genetic architecture of the genome, and sequence-based marker data facilitate comparative mapping and the study of genomic relationships among species.
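The map lengths quoted throughout this section are in centimorgans (cM), which are obtained from observed recombination fractions (r) through a mapping function. The text does not state which function each cited map used, so the following Python sketch simply implements the two standard ones, Haldane (no crossover interference) and Kosambi (allowing some interference), for illustration.

```python
import math


def haldane_cm(r: float) -> float:
    """Haldane mapping function (no crossover interference): d = -50 ln(1 - 2r) cM."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi mapping function (partial interference): d = 25 ln((1 + 2r) / (1 - 2r)) cM."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.01, 0.10, 0.25, 0.40):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):6.2f} cM, Kosambi {kosambi_cm(r):6.2f} cM")
```

For small r both functions give roughly 100r cM; they diverge as r approaches 0.5, with Haldane always giving the longer distance, so total map lengths obtained with different functions are not directly comparable.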

In Brassica rapa, more than ten linkage maps have been developed during the last decade, mainly with molecular markers (RFLP, RAPD, AFLP and SSR) and different mapping populations (Kim et al. 2006, Lou et al. 2008, Sakamoto et al. 2008, Song et al. 1991, Suwabe et al. 2004, 2006, Wang et al. 2004), which made it difficult to compare the maps with each other without a common reference map. Choi et al. (2007) constructed the first reference genetic map for B. rapa using doubled haploid lines derived from a cross between two diverse Chinese cabbage (B. rapa ssp. pekinensis) inbred lines, "Chiifu-401-42" and "Kenshin-402-43". The reference linkage map was updated with the addition of 156 BAC-end SSR markers (Kim et al. 2009) and was subsequently used to construct a high-density integrated map (Li et al. 2010). Recently, the genome of the B. rapa inbred line Chiifu-401-42 has been completely sequenced by the Brassica rapa Genome Consortium (Wang, X. et al. 2011), and the reference linkage map has facilitated the assignment of sequence scaffolds to chromosomes.

Brassica oleracea, representing the C genome of Brassica, comprises various vegetables, one of which, cabbage (B. oleracea var. capitata), was selected for the genome sequencing project. In B. oleracea, more than ten linkage maps have been developed using RFLP, AFLP or SSR markers in different mapping populations (Iniguez-Luy et al. 2009, Okazaki et al. 2007, Schmidt and Bancroft 2011). Integrated maps of B. oleracea have also been constructed with RFLP and AFLP markers by Kianian and Quiros (1992) and Sebastian et al. (2002). Available expressed sequence tag (EST) sequences of Arabidopsis and Raphanus have also been used to construct several other maps and have allowed comparison of the B. oleracea genome with the Arabidopsis genome (Ashutosh et al. 2012, Babula et al. 2003, Kifuji et al. 2013, Kowalski et al. 1994, Lan et al. 2000). A high-density linkage map using Sequence-Related Amplified Polymorphism (SRAP) markers was developed in B. oleracea and used to identify QTLs for curd formation in cauliflower (Gao et al. 2007). In Brassica, 56,465 non-redundant SSR markers identified from B. oleracea whole-genome shotgun sequences were preferentially located on the C genome, and 752 of these markers were polymorphic among six B. napus varieties (Li, H. et al. 2011). After the B. oleracea genome sequencing project was launched, a high-density reference map covering 1,197.9 cM was drafted, including 602 SSRs and 625 SNP markers generated from whole-genome shotgun sequences by NGS (Wang, W. et al. 2012). This was also the first map that allowed the assembled scaffolds to be anchored to pseudochromosomes, which has contributed significantly to Brassica genome studies.

Brassica nigra (BB), one of the diploid Brassica species, has not been studied as extensively at the genomic level as the other Brassica species, despite being a rich source of agronomically important genes for disease resistance, drought tolerance and seed oil quality (Chevre et al. 1996, Sjodin and Glimelius 1989, Struss et al. 1996). The first B. nigra linkage map was developed by Truco and Quiros (1994) using isozyme, RFLP and RAPD markers; in total, 124 markers were assigned to 11 linkage groups covering a total distance of 677 cM. A comprehensive linkage map of B. nigra was later constructed with 160 Arabidopsis DNA probes, which identified 284 homologous loci covering 750 cM (Lagercrantz 1998).

Brassica napus (AACC), a major oilseed crop, is one of the most intensively studied Brassica species in terms of genetics and genomics. More than 30 linkage maps have been developed using various types of mapping populations and different molecular markers for different agronomic traits. The linkage maps in B. napus were developed using AFLP (Mei et al. 2009, Qiu et al. 2006, Radoev et al. 2008), RFLP (Parkin et al. 1995, Uzunova et al. 1995), SRAP (Sun et al. 2007) and SSR (Piquemal et al. 2005, Wang, J. et al. 2011) markers for developmental traits, seed quality and disease resistance. These maps have provided valuable information for rapeseed improvement and for genome structure analysis. Recently, a high-density SNP linkage map consisting of 5,764 SNP and 1,603 PCR markers was developed by integrating four DH populations and was used to assess polymorphism levels and linkage disequilibrium across different collections (Delourme et al. 2013).

Brassica juncea (AABB), one of the six cultivated Brassica species, is the major oilseed crop of India. Genetic and genomic studies in B. juncea have been less intensive than in B. napus, but in recent years the international community has given more attention to B. juncea because of its resistance to salinity and seed shattering. Early linkage maps in B. juncea were developed with RFLP and AFLP markers to investigate various traits (Axelsson et al. 2000, Christianson et al. 2006, Pradhan et al. 2003). A high-density linkage map of B. juncea was developed using AFLP, RFLP, SSR and gene-based markers, with a total of 1,148 loci covering 1,840 cM across 18 linkage groups (Ramchiary et al. 2007). Although these linkage maps were useful for breeding and for tagging important traits, they provided limited information for comparative mapping.

Recent work suggests that B. carinata (BBCC), one of the three cultivated amphidiploids, has better adaptability and productivity than oilseed rape in semi-arid and temperate areas. Being resistant to various diseases and to biotic stress, B. carinata is suitable for cultivation in temperate environments (Getinet et al. 1996) and is also a potential crop for biofuel production (Cardone et al. 2003). Although the genetic diversity of this species has been analyzed, limited work has been done on its genomics. Recently, a linkage map of B. carinata was constructed in which 212 loci were assigned to seventeen linkage groups covering 1,703 cM (Guo et al. 2012).

Raphanus sativus (radish), a member of Brassicaceae, is grown all over the world as a vegetable crop with an edible taproot. Although radish is an economically important crop, genetic and genomic research on it has not progressed as far as in B. rapa and B. napus. A number of genetic maps have been developed in R. sativus using RFLP, AFLP, SSR and EST-SNP markers to analyze QTLs for disease resistance, root shape, flowering time and pigmentation (Bett and Lydiate 2003, Budahn et al. 2009, Hashida et al. 2013, Kamei et al. 2010, Tsuro et al. 2005, Yu, X. et al. 2013, Zou, Z. et al. 2013). EST-based SNP and SSR markers were used to construct dense linkage maps, and alignment of the marker sequences to known Brassica sequences revealed extensive chromosome homoeology among Brassicaceae (Li, F. et al. 2011, Shirasawa et al. 2011). Brassica SSR and BAC-end sequence markers have also been used in R. sativus to identify QTLs for Fusarium wilt resistance (Yu, X. et al. 2013).

Comparative mapping for identification of conserved genomic segments

Arabidopsis, a member of Brassicaceae, is closely related to Brassica at the genomic sequence level, showing around 85-90% identity in exonic regions (Schmidt 2002). Because molecular markers of one Brassica species are transferable to other Brassica species, comparative mapping studies are possible among cultivated Brassica species and with A. thaliana, as well as with other Brassicaceae crops. On the basis of comparative mapping studies between A. thaliana and the ancestral karyotypes, 24 crucifer genomic blocks (A-X) have been proposed by Schranz et al. (2006), and these are now widely accepted by the scientific community. Comparative genetic mapping between Arabidopsis and Brassica species proposed, and then confirmed at the micro- and macro-level, the presence of segmental duplications and genome rearrangements in the Brassica A, B and C genomes (Navabi et al. 2013, O'Neill and Bancroft 2000, Parkin et al. 2005). Lukens et al. (2003) compared mapped RFLP probe sequences of B. oleracea with the Arabidopsis genome sequence and identified 34 collinear genomic regions. In B. oleracea, comparisons of cDNA or BAC sequences with Arabidopsis and B. rapa identified conserved collinearity of gene order and content in specific chromosomal segments (Li et al. 2003, Qiu et al. 2009). In B. napus, by sequencing mapped RFLP probes and comparing them with the Arabidopsis genome, Parkin et al. (2005) identified 21 genomic blocks linked to the A and C genomes. Most of these conserved segments were found in six copies, which supports the proposed hexaploid ancestor of the diploid Brassica progenitors; these genomic segments appear in duplicated and rearranged form in the present-day B. napus genome. Panjabi et al. (2008) extended comparative work to B. juncea and used Arabidopsis-based polymorphic intron PCR markers to identify conserved chromosomal regions and the evolutionary relationships of the Brassica A, B and C genomes. BAC- and SSR-based linkage maps of B. rapa (Choi et al. 2007, Kim et al. 2009) were adopted as references to anchor sequence contigs of the international B. rapa genome sequencing project and facilitated the identification of conserved genomic blocks between Arabidopsis and B. rapa. The genome sequence of B. rapa provides an important resource for comparative mapping (BrGSP, Wang, X. et al. 2011). Conserved genomic restructuring in B. napus was confirmed by comparing dense SSR- and SNP-based linkage maps with Arabidopsis and B. rapa (Bancroft et al. 2011, Wang, J. et al. 2011). A linkage map was developed in B. carinata, and comparative mapping with the Arabidopsis sequence identified conserved ancestral building blocks (Guo et al. 2012). Recently, B. nigra BAC libraries have been sequenced and compared with Arabidopsis chromosome 4 and the homologous Brassica A and C genomes, revealing conserved collinearity of gene content and order (Navabi et al. 2013). Extensive chromosome sequence homoeology was also revealed in Raphanus by comparing an EST-SNP-based linkage map (Li, F. et al. 2011) and an EST-SSR map (Shirasawa et al. 2011) with the Arabidopsis and B. rapa genome sequences. Very recently, by comparing the whole-genome sequence of B. rapa with the genome sequences or genetic maps of other crucifer species, the boundaries of conserved genomic blocks were redefined for seven ancestral karyotype blocks, deciphering the diploid ancestral genome of mesohexaploid B. rapa (Cheng et al. 2013).
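Finding conserved collinear blocks, as in the comparisons above, essentially means finding runs of orthologous gene pairs that keep their relative order in two genomes. The Python sketch below is a deliberately simplified greedy chaining procedure, not the method of any study cited here: it assumes orthologs have already been identified, represents each gene as a hypothetical (chromosome, rank) pair, and uses arbitrary gap and block-size thresholds. Dedicated synteny-detection tools use scoring schemes that tolerate much more local disorder.

```python
from typing import List, Tuple

Gene = Tuple[str, int]    # hypothetical (chromosome, gene rank along that chromosome)
Pair = Tuple[Gene, Gene]  # (gene in genome 1, orthologous gene in genome 2)


def collinear_blocks(pairs: List[Pair], max_gap: int = 2, min_size: int = 3) -> List[List[Pair]]:
    """Greedily chain ortholog pairs into collinear blocks.

    Pairs are sorted by their position in genome 1. A pair extends the current block
    when both genes stay on the same chromosomes as the previous pair, the genome-1
    rank advances by 1..max_gap, and the genome-2 rank moves by 1..max_gap in a
    consistent direction (forward or reverse). Blocks with fewer than min_size pairs
    are dropped.
    """
    blocks: List[List[Pair]] = []
    current: List[Pair] = []
    direction = 0  # 0 = not yet known, +1 = same orientation, -1 = inverted
    for pair in sorted(pairs):
        if current:
            (prev_c1, prev_r1), (prev_c2, prev_r2) = current[-1]
            (c1, r1), (c2, r2) = pair
            step1, step2 = r1 - prev_r1, r2 - prev_r2
            sign = 1 if step2 > 0 else -1
            extends = (
                (c1, c2) == (prev_c1, prev_c2)
                and 0 < step1 <= max_gap
                and 0 < abs(step2) <= max_gap
                and direction in (0, sign)
            )
            if extends:
                current.append(pair)
                direction = sign
                continue
            if len(current) >= min_size:
                blocks.append(current)
        # Start a new candidate block at this pair.
        current, direction = [pair], 0
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


# Hypothetical toy data: four genes on "At1" map in order to "A03",
# the next three map in reverse order to "A05".
pairs = [
    (("At1", 1), ("A03", 10)),
    (("At1", 2), ("A03", 11)),
    (("At1", 3), ("A03", 13)),
    (("At1", 4), ("A03", 14)),
    (("At1", 5), ("A05", 40)),
    (("At1", 6), ("A05", 39)),
    (("At1", 7), ("A05", 38)),
]
for block in collinear_blocks(pairs):
    print(block[0], "->", block[-1], f"({len(block)} pairs)")
```

The toy data yield one forward block of four pairs and one inverted block of three pairs, the kind of pattern that comparative maps interpret as a conserved segment followed by an inversion.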

Functional genomic regions and candidate genes have also been identified in Brassica by a comparative mapping approach (Schmidt and Bancroft 2011). Several examples of mapping and identifying candidate genes in B. rapa through comparative genomics have been reported, including the cloning of the flowering-time FLC genes (Schranz et al. 2002), the mapping of clubroot resistance genes in regions syntenic to Arabidopsis chromosomes (Suwabe et al. 2006), and the mapping of QTLs for flowering time (Li et al. 2009). QTLs for clubroot resistance were identified in B. oleracea, and the resistance genes were compared between B. rapa and B. oleracea (Nagaoka et al. 2010). In B. juncea, comparing QTL regions for aliphatic glucosinolate content with the corresponding Arabidopsis sequence identified candidate genes regulating the aliphatic glucosinolate biosynthetic pathway (Bisht et al. 2009). In B. oleracea, candidate genes for male fertility were identified by comparing sequence-tagged markers with the genome sequences of Arabidopsis and B. rapa (Ashutosh et al. 2012). Because the B. napus genome sequence is not yet available, the sequences of its diploid progenitors B. rapa and B. oleracea were used in comparative mapping with Arabidopsis to identify candidate genes underlying QTLs for seed weight in B. napus (Cai et al. 2012). Li et al. (2013) identified five major conserved functional genomic regions containing QTLs for morphological and yield traits across the A, B and C subgenomes of B. rapa, B. juncea and B. napus. The knowledge gained from comparative analysis has revealed a high level of sequence collinearity across Brassicaceae and helps in understanding genome evolution and polyploidization. Comparative genomic studies give confidence in identifying orthologous candidate genes for important agronomic traits in Brassica crops and help in generating integrated linkage maps across species.

Next-generation genotyping techniques in Brassica 

Recent advances in NGS technology have led to various approaches for simultaneous sequence variant discovery and genotyping. Selected genomic regions or targeted restriction fragments from pooled individuals can be sequenced in a single run on a massively parallel sequencing platform. The resulting sequences are aligned to a reference genome, the aligned sequences of individuals are compared, and variant sites are identified as SNPs. These approaches are cost-effective and highly efficient in generating large amounts of informative data. Different protocols are available, such as complexity reduction of polymorphic sequences (CRoPS, Van Orsouw et al. 2007), restriction-site associated DNA sequencing (RADseq, Baird et al. 2008), genotyping by sequencing (GBS, Elshire et al. 2011) and diversity arrays technology (DArT, Jaccoud et al. 2001). Each of these protocols has its own advantages and limitations, but all are reliable for SNP discovery and genotyping. Comparisons of these protocols are given in several reviews (Davey et al. 2011, Nielsen et al. 2011). These genotyping methods have been explored in only a few Brassica species, although transcriptome sequencing has been used for SNP discovery in B. napus and B. rapa (Hu et al. 2012, Trick et al. 2009a, 2009b).
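The alignment and variant-identification step described above can be illustrated with a toy pileup-based SNP caller. This Python sketch assumes reads have already been placed on the reference without insertions or deletions, and it flags a site when the most frequent non-reference base passes arbitrary depth and frequency thresholds; production pipelines instead use dedicated aligners and probabilistic genotype callers, and none of the names here belong to a real tool.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple


def call_snps(reference: str,
              aligned_reads: List[Tuple[int, str]],
              min_depth: int = 3,
              min_alt_fraction: float = 0.3) -> List[Tuple[int, str, str, int, float]]:
    """Return (position, ref_base, alt_base, depth, alt_fraction) for candidate SNPs.

    aligned_reads holds (0-based start, sequence) pairs for reads already placed
    on the reference without gaps; the thresholds are illustrative defaults.
    """
    # Build a pileup: base counts observed at every reference position.
    pileup: DefaultDict[int, Counter] = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        ref_base = reference[pos]
        alt_counts = [(n, b) for b, n in counts.items() if b != ref_base]
        if depth < min_depth or not alt_counts:
            continue
        alt_count, alt_base = max(alt_counts)  # most frequent non-reference base
        fraction = alt_count / depth
        if fraction >= min_alt_fraction:
            snps.append((pos, ref_base, alt_base, depth, round(fraction, 2)))
    return snps


reference = "ACGTACGTAC"
reads = [(0, "ACGTACG"), (1, "CGTTCGT"), (2, "GTTCGTA"), (3, "TTCGTAC")]
print(call_snps(reference, reads))
```

Here three of the four reads covering position 4 carry T where the reference has A, so that site is reported as an A/T SNP with depth 4 and an alternative-allele fraction of 0.75.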

The RADseq technique identified more than 20,000 SNPs in B. napus and simultaneously genotyped eight different inbred lines (Bus et al. 2012). This method is simple, cost-effective and efficient, and offers an alternative to transcriptome sequencing for SNP genotyping. DArT markers developed in Brassica and related species have been used in the molecular diversity analysis of 89 accessions of B. napus, B. rapa, B. juncea and B. carinata (Raman et al. 2012). Recently, a consensus linkage map based on DArT markers has been developed in B. napus, consisting of 1,359 markers spanning all 19 chromosomes and covering a total of 1,987.2 cM, with an average map density of one marker per 1.46 cM (Raman et al. 2013). Most of the DArT markers, having been sequenced and aligned with the B. rapa and B. oleracea genomes, are useful in comparative mapping and genome evolution studies. Wells et al. (2013) developed a methodology based on pooled PCR product sequencing that incorporates bar-coded amplification tags (BATs) into PCR products. Using this method, targeted gene sequences were screened in a B. napus population, and the resulting allele scores mapped 24 markers to their expected positions on the B. napus linkage map. In summary, next-generation high-throughput genotyping techniques can provide increased marker density for genomic selection or genome-wide association studies. These genotyping methods still need to be explored further in Brassica and Raphanus to generate high-density linkage maps.
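The map densities quoted in this section are simple ratios of total map length to marker number. A small Python sketch that reproduces that arithmetic for two maps described above, with the values copied from the text (the dictionary layout is only for illustration):

```python
# Total map length (cM) and number of mapped markers or loci, as quoted in the text.
MAPS = {
    "B. napus DArT consensus map (Raman et al. 2013)": (1987.2, 1359),
    "B. juncea high-density map (Ramchiary et al. 2007)": (1840.0, 1148),
}


def cm_per_marker(length_cm: float, n_markers: int) -> float:
    """Average density as cM per marker: total map length / number of markers.

    Dividing by the number of intervals (markers minus linkage groups) would
    instead give the mean inter-marker spacing, which is slightly larger.
    """
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return length_cm / n_markers


for name, (length_cm, n_markers) in MAPS.items():
    print(f"{name}: one marker per {cm_per_marker(length_cm, n_markers):.2f} cM")
```

For the B. napus map this gives 1.46 cM, matching the reported figure; dividing by the 1,340 intervals across 19 chromosomes would instead give about 1.48 cM.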
EST sequences and transcriptomes

In Brassicaceae, large numbers of EST sequences have been generated from all major Brassica crops and related species. In the NCBI database, approximately 1,500,000 EST sequences are available from various tissues, exposed to different stress or growth conditions, of ten different species of Brassicaceae (excluding Arabidopsis). Love et al. (2005) initiated the development of a Brassica microarray by assembling EST sequences. The Brassica 95K EST microarray was developed by clustering and assembling the 810,254 raw Brassica ESTs available in 2007 into non-redundant unigenes (Trick et al. 2009a). These unigenes have been loaded on a web portal (http://brassica.bbsrc.ac.uk) and are a valuable source for comparative mapping and genome analysis. Unigene sets can be used as pseudo-reference sequences for next-generation re-sequencing projects, for transcriptome assembly (Trick et al. 2009b) and for SNP chips (Hayward et al. 2012).

In radish, abundant EST sequences are available in the public domain and are being used to generate linkage maps and for genetic studies. In total, 3,800 EST-SSR markers were developed from 26,606 ESTs derived from different tissues of R. sativus, and a linkage map covering 1,129.3 cM was constructed using 630 EST-SSR and 213 previously reported marker loci (Shirasawa et al. 2011). Radish ESTs available in databases (Radish DB; http://radish.plantbiology.msu.edu) were used to discover SNPs and to construct an R. sativus linkage map consisting of 726 markers, and 72 regions syntenic to Arabidopsis were identified (Li, F. et al. 2011). RNA-seq-based transcriptome profiling of radish roots was used to identify genes responding to lead (Pb) stress, and 22 genes were validated by quantitative real-time PCR (Wang et al. 2013). According to the latest update, a total of 311,799 high-quality EST sequences have been generated from raw EST data and assembled into 85,083 unigenes (RadishBase, bioinfo.bti.cornell.edu/radish) (Shen et al. 2013).
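EST-SSR markers such as those developed from radish ESTs are usually mined by scanning sequences for short tandemly repeated motifs. The following Python sketch finds perfect (uninterrupted) repeats of 1-6 bp motifs with regular expressions; the minimum copy numbers are assumptions chosen for illustration and are not necessarily the criteria used by Shirasawa et al. (2011), and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Illustrative minimum numbers of tandem copies per motif length (bp); real
# SSR-mining studies choose their own thresholds.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_periodic(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AGAG' = 'AG' x 2)."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return sorted (0-based start, motif, copy number) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # A k-bp motif followed immediately by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_periodic(motif):
                continue  # covered when scanning for its shorter unit
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


est = "TTGC" + "AG" * 8 + "CTTA" + "ATC" * 5 + "GG"
print(find_ssrs(est))
```

On this made-up EST the sketch reports an (AG)8 dinucleotide repeat and an (ATC)5 trinucleotide repeat; primers flanking such repeats are what turn them into EST-SSR markers.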

In recent years, with the advancement of NGS technology, it has become possible to re-sequence whole genomes economically or to generate large amounts of transcriptome data in a short time. These sequences have been used for variant analysis to develop genic and functional markers. NGS techniques have been used to generate transcriptome sequences in polyploid B. napus and to discover single nucleotide polymorphisms (Hu et al. 2012, Trick et al. 2009b). Furthermore, the B. napus genome was dissected using transcriptome sequences from leaf samples of the parents and of a mapping population, and an SNP linkage map of about 23,000 markers was constructed (Bancroft et al. 2011). Sequence comparison of the B. napus genome with its progenitors B. rapa and B. oleracea revealed genome rearrangements and traced the inheritance of genomic segments. Transcriptome profiling based on deep EST sequencing in B. napus and three other oilseed species revealed both conserved and distinct species-specific expression patterns for genes involved in the synthesis of glycerolipids and their precursors (Troncoso-Ponce et al. 2011). Higgins et al. (2012) employed an NGS-based RNA-seq technique to discriminate the A and C genome transcriptomes in amphidiploid B. napus and measured the contribution of each genome to gene expression. The associative transcriptomics approach has been explored in B. napus to identify genomic deletions in QTL regions for seed glucosinolate content (Harper et al. 2012). Distinct gene expression patterns in response to waterlogging have also been identified in B. napus roots at the seedling stage (Zou, X. et al. 2013).
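One way to separate homoeologous A- and C-genome transcripts, as in the B. napus RNA-seq work above, is to look at positions where the two homoeologs differ and assign each read according to the base it carries. The Python sketch below illustrates only that idea; it is not the pipeline of Higgins et al. (2012), and it assumes reads are already aligned without gaps to a single gene and that the diagnostic positions are known in advance. All data shown are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_homoeolog_reads(reads: List[Tuple[int, str]],
                           diagnostic: Dict[int, Tuple[str, str]]) -> Dict[str, int]:
    """Count reads supporting the A- or C-genome copy of one gene.

    reads: (0-based start, sequence) for reads aligned without gaps to one gene.
    diagnostic: position -> (A-genome base, C-genome base) at sites where the two
    homoeologs differ. Bases matching neither homoeolog are ignored; reads with no
    informative site, or with conflicting sites, are counted as 'ambiguous'.
    """
    counts = {"A": 0, "C": 0, "ambiguous": 0}
    for start, seq in reads:
        votes = set()
        for pos, (a_base, c_base) in diagnostic.items():
            offset = pos - start
            if 0 <= offset < len(seq):
                if seq[offset] == a_base:
                    votes.add("A")
                elif seq[offset] == c_base:
                    votes.add("C")
        if len(votes) == 1:
            counts[votes.pop()] += 1
        else:
            counts["ambiguous"] += 1
    return counts


# Hypothetical gene with two homoeolog-diagnostic sites.
diagnostic = {5: ("G", "A"), 12: ("T", "C")}
reads = [(0, "AAAAAG"), (3, "AAAAAAAAAC"), (10, "AAT"), (0, "AAA")]
counts = assign_homoeolog_reads(reads, diagnostic)
a_share = counts["A"] / (counts["A"] + counts["C"])
print(counts, f"A-genome share = {a_share:.2f}")
```

In this toy case two reads are assigned to the A-genome copy, one to the C-genome copy and one covers no diagnostic site, giving an A-genome share of about 0.67 for this gene.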

In Brassica rapa, abiotic stress transcriptome studies identified 56 transcription factors and 60 genes commonly expressed under various stresses (Lee et al. 2008). Gene expression patterns in different tissues of B. rapa analyzed by RNA-seq revealed transcriptome complexity (Tong et al. 2013). Recently, in B. juncea (tumorous stem mustard), transcript-level analysis was performed to detect gene expression patterns at various stages of stem development (Sun et al. 2012). In brief, advances in transcriptomic studies help us understand the complexity of gene expression and regulatory networks at various developmental stages and in response to biotic and abiotic stresses; in addition, high-resolution genome dissection provides resources for comparative and functional genomics.

Genome sequencing of the Brassicaceae family 

Recent advances in high-throughput sequencing technology have immensely benefited whole-genome sequencing projects in non-model organisms and have opened a new era in comparative genomics. In the Brassicaceae family, the genomes of ten species have to date been partially or completely sequenced, including the model plant A. thaliana and cultivated Brassica species such as B. rapa and B. oleracea, as summarized in Table 1. The annotated Arabidopsis genome sequence provides a valuable reference, and genome sequences have also been used to develop DNA markers and a number of informative linkage maps in cultivated Brassica species, which have been used to identify candidate genes. Most of the ancestral progenitor sequences have been used for genome evolution studies and the identification of conserved ancestral genomic segments. The project to sequence 1,001 accessions of A. thaliana will enable genome-wide association studies in this species (http://1001genomes.org). These sequences will link phenotypic diversity with genome variation and generate a large resource for the plant community.

Sequencing of Brassica genomes is required for finding important genes, understanding genome evolution and improving crops. Considering this, and the importance of Brassica crops, sequencing of the Brassica genomes was initiated in 2002 by the Multinational Brassica Genome Project (MBGP). The B. rapa Chinese cabbage (cv. Chiifu-401) was the first genome selected for sequencing because of its small genome size (529 Mb) and low frequency of repetitive sequences. The draft genome sequence of B. rapa (A genome) has been published by the Brassica rapa Genome Sequencing Project (BrGSP) (Wang, X. et al. 2011). A total of 41,174 protein-coding genes were modeled in the B. rapa genome, and 10 pseudochromosomes were produced by anchoring the assembly with 1,427 markers. The genome sequence of another important vegetable crop, B. oleracea (C genome), has recently been completed using a whole-genome shotgun (WGS) sequencing strategy. A 630 Mb draft genome assembly was obtained, with a scaffold N50 size of 1.457 Mb and a contig N50 size of 26.828 kb, and was assigned to nine pseudochromosomes containing 45,758 predicted genes (Yu, J. et al. 2013, http://www.ocri-genomics.org/bolbase/index.html).

Table 1. Summary of genome sequences completed in Brassicaceae species

Species | Genome size (Mb) (a) | Number of predicted genes | % of genes orthologous to A. thaliana | Reference
Arabidopsis thaliana | 135-157 | 28,710 | 100 | Arabidopsis Genome Initiative 2000
Arabidopsis lyrata | | | | Hu et al. 2011
Schrenkiella parvula | 140 | 28,901 | 80.2 | Dassanayake et al. 2011
Brassica rapa | 529 | 41,174 | 78.2 | Wang, X. et al. 2011
Capsella rubella | 210-216 | 26,521 | 88 | Slotte et al. 2013
Eutrema salsugineum | 314 | 26,521 | 82.7 | Yang et al. 2013
Leavenworthia alabamica | 316 | 30,343 | 67.7 | Haudry et al. 2013
Sisymbrium irio | 262 | 28,917 | 82.9 | Haudry et al. 2013
Aethionema arabicum | 240 | 23,167 | 72.4 | Haudry et al. 2013
Brassica oleracea | 696 | 45,758 | | Yu, J. et al. 2013

(a) Genome size reference from Johnston et al. (2005) or adapted from Haudry et al. (2013).
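The scaffold and contig N50 values quoted for the B. oleracea assembly summarize how contiguous an assembly is: N50 is the length such that sequences of that length or longer contain at least half of all assembled bases. A minimal Python sketch of the calculation, with made-up lengths standing in for real scaffolds:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half_total")


# Made-up scaffold lengths (bp), for illustration only.
scaffolds = [100, 80, 60, 40, 20]
print(n50(scaffolds))
```

Here the two longest scaffolds (180 bp) already exceed half of the 300 bp total, so the N50 is 80; the same calculation applied to contigs instead of scaffolds gives the contig N50.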

Recently, Haudry et al. (2013) sequenced the genomes of three Brassicaceae species, Leavenworthia alabamica, Sisymbrium irio and Aethionema arabicum. Comparative analysis with previously sequenced genomes identified 90,000 conserved noncoding sequences (CNS) in Brassicaceae that show evidence of transcriptional and post-transcriptional regulation. Work is currently in progress to complete the genome sequences of other Brassica and Raphanus species in the near future.

Genomic resources 

Advances in sequencing technology have produced vast amounts of genomic information and sequence data in major crops. In the past decade, various Brassica databases have been integrated on a common platform to facilitate their efficient use by diverse researchers. An open-access integrated database provides annotated genome information, genetic and physical maps, molecular markers, reference maps and gene expression data. The UK Brassica community took the initiative in this direction in 1996 by compiling Brassica sequences and genetic maps to create the BrassicaDB database. A major advance in knowledge sharing was the initiation of the Multinational Brassica Genome Project (MBGP) in 2002. A number of open-access databases are available for Brassicaceae, with systematic information on linkage maps, QTL maps, details of mapping populations, BAC libraries, marker data, EST repositories and genome sequences. The annotated B. rapa genome sequence is available on the BRAD Brassica database (IVF-CAAS, China) and the BrassEnsembl (Rothamsted Research, UK) web resources. Recently, the B. oleracea genome sequence has become available for comparative analysis on the Bolbase data source (http://www.ocri-genomics.org/bolbase/index.html), although the complete genome is yet to be released for download. RadishBase, a database of radish genetics and genomics, was recently developed by Cornell University, USA, and contains SSR, EST and SNP marker information, linkage maps and organelle genome sequences. Many genetic and genomic resources for Brassicaceae are currently available and are summarized in Table 2. The integrated knowledge available in the public domain will provide a platform for exchanging information and a basis for Brassica crop improvement.

Conclusion and perspectives 

Since the completion of genome sequencing of the model plant A. thaliana, and later of B. rapa and B. oleracea, comparative mapping between these species and important Brassicaceae crops has become possible. However, the presence of duplicated and repetitive DNA complicates proper alignment and the identification of the actual causal genes among many paralogs. Genome sequence information for the other four cultivated Brassica genomes is still not available. Because sequences are highly conserved across Brassica species, molecular markers and genomic information obtained for the extensively studied B. rapa, B. oleracea and B. napus could be transferred to other commercial Brassica crops. High-density consensus genetic maps, common marker systems and genomic sequence information are of great significance for accelerating breeding progress, as they allow comparative QTL mapping, marker-assisted selection and cloning of economically important genes for desired traits.

Although many genomic resources have been established and the genomes of several Brassicaceae crops have been sequenced, most comparative studies remain at the structural genomics level, and only a handful of genes governing important traits have been identified and functionally characterized. Thus, the genomic and comparative genomic resources now being established are only a starting point for exploring the variation within Brassicaceae. In the near future, functional genomics should increasingly be used to identify the genes underlying economically important traits for directed gene-assisted selection, and to detect genetic variation within species. Combining techniques such as transcriptomic analysis, high-throughput genotyping and phenotypic characterization will also make it possible to study the expression of duplicated genes under different environmental conditions.

Table 2. List of web resources and databases providing bioinformatics analysis and genomic resources for the Brassicaceae

Name | URL | Key contents
ACPFG Applied Bioinformatics group | http://www.appliedbioinformatics.com.au/index.php/Main_Page; http://www.brassicagenome.net/ | B. rapa genome browser, EST-SNP database, BrassicaDB, CMap to compare genome and genetic maps
Bolbase | http://www.ocri-genomics.org/bolbase/ | Genomic data of B. oleracea; analysis of genome structure and syntenic regions; browse, search and download the genomes of B. rapa and A. thaliana
BRAD | http://brassicadb.org/brad/ | Compilation of sequence datasets, including the complete sequence of B. rapa; annotations of genes orthologous to those in A. thaliana; genetic markers and genetic maps; BLAST server
BrassEnsembl | http://www.brassica.info/BrassEnsembl/index.html | B. rapa genome sequence; consensus integrated genetic maps of the Brassica A and C genomes
Brassica Genome Gateway | http://brassica.nbi.ac.uk | Brassica genome sequencing database, Brassica 95K unigene set, the Brassica IGF Project, BrassicaDB
Brassica.info | http://www.brassica.info/ | Web-based open resource for exchanging information on Brassica genomics and genetics; registries of reference datasets; nomenclature standards; a compilation of ongoing public-domain genome sequencing
BrassicaDB | http://brassica.nbi.ac.uk/BrassicaDB/ | Comprehensive sequence dataset, genetic maps and markers in Brassica species, BLAST server, physical maps
CropStoreDB | http://www.cropstoredb.org | A collection of datasets related to plant and crop genetics, with Brassica data implemented
Radish Base | http://bioinfo.bti.cornell.edu/cgi-bin/radish/index.cgi | Assembled and annotated ESTs, predicted metabolic pathways, EST-SSR and SNP markers, genetic maps
Radish database | http://radish.plantbiology.msu.edu/index.php/Main_Page | EST sequences, linkage maps, SNP and SSR markers, radish genome sequence updates